Exploiting contextual information for prosodic event detection using auto-context

Authors

  • Junhong Zhao
  • Wei-Qiang Zhang
  • Hua Yuan
  • Michael T. Johnson
  • Jia Liu
  • Shanhong Xia
Abstract

Prosody and prosodic boundaries carry significant linguistic and paralinguistic information and are important aspects of speech. In the field of prosodic event detection, many local acoustic features have been investigated; however, contextual information has not yet been thoroughly exploited. The most difficult aspect of this lies in learning long-distance contextual dependencies effectively and efficiently. To address this problem, we introduce the use of an algorithm called auto-context. In this algorithm, a classifier is first trained on a set of local acoustic features, after which the generated probabilities are used along with the local features as contextual information to train new classifiers. By iteratively using updated probabilities as the contextual information, the algorithm can accurately model contextual dependencies and improve classification ability. The advantages of this method include its flexible structure and its ability to capture contextual relationships. When using the auto-context algorithm based on support vector machines, we can improve the detection accuracy by about 3% and the F-score by more than 7% on both two-way and four-way pitch accent detection in combination with the acoustic context. For boundary detection, the accuracy improvement is about 1% and the F-score improvement reaches 12%. The new algorithm outperforms conditional random fields, especially on boundary detection in terms of F-score. It also outperforms an n-gram language model on the task of pitch accent detection.
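The iterative procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a small logistic-regression classifier stands in for the support vector machine so that the example needs only the Python standard library. The function names (`train_logreg`, `add_context`, `auto_context`), the context offsets, the padding value 0.5, and the toy data are all assumptions made for illustration.

```python
import math
import random


def predict_proba(w, b, x):
    """Probability of the positive class under a logistic model."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    z = max(-30.0, min(30.0, z))  # clamp to keep math.exp well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, epochs=200, lr=0.5):
    """Stand-in classifier (assumption): logistic regression fitted by
    batch gradient descent. The paper uses a support vector machine."""
    n, n_feat = len(X), len(X[0])
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * n_feat, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(n_feat):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def add_context(X, probs, offsets=(-2, -1, 1, 2)):
    """Append the probabilities of neighbouring units to each local
    feature vector. Positions outside the sequence are padded with 0.5."""
    n = len(X)
    out = []
    for t in range(n):
        ctx = [probs[t + o] if 0 <= t + o < n else 0.5 for o in offsets]
        out.append(list(X[t]) + ctx)
    return out


def auto_context(X, y, n_iter=3):
    """Train on local features, then repeatedly retrain on local features
    plus the previous round's neighbouring probabilities."""
    w, b = train_logreg(X, y)
    models = [(w, b)]
    probs = [predict_proba(w, b, x) for x in X]
    for _ in range(n_iter):
        Xc = add_context(X, probs)
        w, b = train_logreg(Xc, y)
        models.append((w, b))
        probs = [predict_proba(w, b, x) for x in Xc]
    return models, probs


# Toy sequence (hypothetical data): labels come in runs of three, and the
# single "acoustic" feature is the label plus Gaussian noise.
random.seed(0)
y = [1 if (t // 3) % 2 == 0 else 0 for t in range(60)]
X = [[yi + random.gauss(0, 1.0)] for yi in y]
models, probs = auto_context(X, y, n_iter=2)
```

In this sketch the whole data set is treated as one sequence, and the context is limited to two neighbours on each side. A wider or sparser set of offsets would be the natural way to reach the long-distance dependencies the abstract mentions.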

Similar resources

Exploiting Contextual Information from Event Logs for Personalized Recommendation

Nowadays, recommender systems are widely used in various domains to help customers access more satisfying products or services. It is expected that exploiting customers’ contextual information can improve the quality of recommendation results. Most earlier researchers assume that they already have customers’ explicit ratings on items and that each rating has the customer’s abstracted context (e.g. su...

Enriching machine-mediated speech-to-speech translation using contextual information

Conventional approaches to speech-to-speech (S2S) translation typically ignore key contextual information such as prosody, emphasis, and discourse state in the translation process. Capturing and exploiting such contextual information is especially important in machine-mediated S2S translation as it can serve as a complementary knowledge source that can potentially aid the end users in improved unde...

Syllable-level prominence detection with acoustic evidence

Accurate prominence annotation benefits many spoken language understanding tasks as well as speech synthesis. In this work, we conduct a thorough study using acoustic prosodic cues for prominence detection in speech. This study is different from previous work in several aspects. In addition to the widely used prosodic features, such as pitch, energy, and duration, we introduce the use of cepstr...

Parsing and Spoken Structural Event Detection

This report describes research conducted by the Parsing and Spoken Structural Event Detection (PaSSED) team as part of the 2005 Johns Hopkins Summer Workshop on Language Engineering. This project investigated the interaction between parsing and the detection of structural metadata in conversational speech, including sentence boundaries, edits (the reparandum portion of speech repairs), and fill...

The contextual map: detecting and exploiting affinity between contextual information in context-aware mobile environments

Context-aware computing generally focuses on abstracting the situation of individual entities, such as persons, places and objects, making this information available for further computational exploitation. Those resulting entities’ contexts allow a wide spectrum of application cases in various domains, foremost in mobile computing and internet applications. With contexts from multiple entities ...

Journal title:
  • EURASIP J. Audio, Speech and Music Processing

Volume 2013  Issue 

Pages  -

Publication date 2013